Appendix A Derivation of (3). Since θ(m) satisfies the stationarity condition of the lower-level objective function in (2), we obtain

Neural Information Processing Systems

Masks may vary between iterations, and the pruned weights are shown in light gray. Different edge colors in the neural networks indicate weight updates. The initial learning rate for all methods is 0.1. All evaluations are run on a single Tesla-V100 GPU. P do not require additional epochs for retraining.



Improving Accuracy-robustness Trade-off via Pixel Reweighted Adversarial Training

Zhang, Jiacheng, Liu, Feng, Zhou, Dawei, Zhang, Jingfeng, Liu, Tongliang

arXiv.org Artificial Intelligence

Adversarial training (AT) trains models using adversarial examples (AEs), which are natural images modified with specific perturbations to mislead the model. These perturbations are constrained by a predefined perturbation budget $\epsilon$ and are equally applied to each pixel within an image. However, in this paper, we discover that not all pixels contribute equally to the accuracy on AEs (i.e., robustness) and accuracy on natural images (i.e., accuracy). Motivated by this finding, we propose Pixel-reweighted AdveRsarial Training (PART), a new framework that partially reduces $\epsilon$ for less influential pixels, guiding the model to focus more on key regions that affect its outputs. Specifically, we first use class activation mapping (CAM) methods to identify important pixel regions, then we keep the perturbation budget for these regions while lowering it for the remaining regions when generating AEs. In the end, we use these pixel-reweighted AEs to train a model. PART achieves a notable improvement in accuracy without compromising robustness on CIFAR-10, SVHN and TinyImagenet-200, justifying the necessity to allocate distinct weights to different pixel regions in robust classification.
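The pixel-reweighting idea in the abstract above can be illustrated with a minimal sketch. It assumes a precomputed importance map (a stand-in for the output of a CAM method) and a simple threshold rule. The function names, the threshold, and the budget values 8/255 and 6/255 are illustrative assumptions, not details given in the abstract.

```python
def pixel_budgets(importance, eps, eps_low, threshold):
    """Build a per-pixel budget map (hypothetical helper).

    Pixels whose importance is at least `threshold` keep the full budget
    `eps`; all other pixels get the reduced budget `eps_low`.
    """
    return [[eps if v >= threshold else eps_low for v in row]
            for row in importance]


def project(delta, budgets):
    """Clip each perturbation entry into [-budget, +budget] for its pixel."""
    return [[max(-b, min(b, d)) for d, b in zip(d_row, b_row)]
            for d_row, b_row in zip(delta, budgets)]


# Toy 2x2 "image": the importance values stand in for a CAM heat map.
importance = [[0.9, 0.1],
              [0.2, 0.8]]
budgets = pixel_budgets(importance, eps=8 / 255, eps_low=6 / 255, threshold=0.5)

# A candidate perturbation, projected onto the per-pixel budgets.
delta = [[0.05, 0.05],
         [-0.05, 0.01]]
clipped = project(delta, budgets)
```

In an actual attack loop, a projection of this kind would typically follow each gradient step, so that less influential pixels never receive more than the reduced budget.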


Balance is Essence: Accelerating Sparse Training via Adaptive Gradient Correction

Lei, Bowen, Xu, Dongkuan, Zhang, Ruqi, He, Shuren, Mallick, Bani K.

arXiv.org Artificial Intelligence

Despite impressive performance, deep neural networks require significant memory and computation costs, prohibiting their application in resource-constrained scenarios. Sparse training is one of the most common techniques to reduce these costs. However, the sparsity constraints make optimization harder, resulting in longer training time and instability. In this work, we aim to overcome this problem and achieve space-time co-efficiency. To accelerate and stabilize the convergence of sparse training, we analyze the gradient changes and develop an adaptive gradient correction method. Specifically, we approximate the correlation between the current and previous gradients, which is used to balance the two gradients to obtain a corrected gradient. Our method can be used with the most popular sparse training pipelines under both standard and adversarial setups. Theoretically, we prove that our method can accelerate the convergence rate of sparse training. Extensive experiments on multiple datasets, model architectures, and sparsities demonstrate that our method outperforms leading sparse training methods by up to 5.0% in accuracy given the same number of training epochs, and reduces the number of training epochs by up to 52.1% to achieve the same accuracy. Our code is available at https://github.com/StevenBoys/AGENT.
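As a rough illustration of balancing the current and previous gradients by their correlation, the sketch below blends two gradient vectors using a clamped cosine similarity as the weight. This is an assumed, simplified rule chosen for illustration. The abstract does not give the paper's actual correction formula, and the names `corrected_gradient` and `max_weight` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def corrected_gradient(g_cur, g_prev, max_weight=0.5):
    """Blend current and previous gradients (illustrative assumption).

    The weight on the previous gradient is the cosine similarity between
    the two, clamped to [0, max_weight]: uncorrelated or opposing gradients
    leave the current gradient unchanged.
    """
    c = max(0.0, min(max_weight, cosine(g_cur, g_prev)))
    return [(1.0 - c) * gc + c * gp for gc, gp in zip(g_cur, g_prev)]


# Orthogonal gradients: correlation 0, so the current gradient is returned.
unchanged = corrected_gradient([1.0, 0.0], [0.0, 1.0])

# Well-aligned gradients: the previous gradient contributes up to max_weight.
blended = corrected_gradient([2.0, 0.0], [4.0, 0.0])
```

Clamping the weight keeps the current gradient dominant, which is one simple way to avoid stale information overwhelming the update. Other weighting schemes are equally plausible.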


Revisiting Personalized Federated Learning: Robustness Against Backdoor Attacks

Qin, Zeyu, Yao, Liuyi, Chen, Daoyuan, Li, Yaliang, Ding, Bolin, Cheng, Minhao

arXiv.org Artificial Intelligence

In this work, besides improving prediction accuracy, we study whether personalization could bring robustness benefits against backdoor attacks. We conduct the first study of backdoor attacks in the pFL framework, testing 4 widely used backdoor attacks against 6 pFL methods on the benchmark datasets FEMNIST and CIFAR-10, for a total of 600 experiments. The study shows that pFL methods with partial model-sharing can significantly boost robustness against backdoor attacks. In contrast, pFL methods with full model-sharing do not show robustness. To analyze the reasons for varying robustness performances, we provide comprehensive ablation studies on different pFL methods. Based on our findings, we further propose a lightweight defense method, Simple-Tuning, which empirically improves defense performance against backdoor attacks. We believe that our work could provide both guidance for pFL application in terms of its robustness and valuable insights for designing more robust FL methods in the future. We open-source our code to establish the first benchmark for black-box backdoor attacks in pFL: https://github.com/alibaba/FederatedScope/tree/backdoor-bench.


Supervising the Multi-Fidelity Race of Hyperparameter Configurations

Wistuba, Martin, Kadra, Arlind, Grabocka, Josif

arXiv.org Artificial Intelligence

Multi-fidelity (gray-box) hyperparameter optimization (HPO) techniques have recently emerged as a promising direction for tuning Deep Learning methods. However, existing methods suffer from a sub-optimal allocation of the HPO budget to the hyperparameter configurations. In this work, we introduce DyHPO, a Bayesian Optimization method that learns to decide which hyperparameter configuration to train further in a dynamic race among all feasible configurations. We propose a new deep kernel for Gaussian Processes that embeds the learning curve dynamics, and an acquisition function that incorporates multi-budget information. We demonstrate the significant superiority of DyHPO over state-of-the-art hyperparameter optimization methods through large-scale experiments comprising 50 datasets (Tabular, Image, NLP) and diverse architectures (MLP, CNN/NAS, RNN).


A framework for the emergence and analysis of language in social learning agents

Wieczorek, Tobias J., Tchumatchenko, Tatjana, Carvajal, Carlos Wert, Eggl, Maximilian F.

arXiv.org Artificial Intelligence

Artificial neural networks (ANNs) are increasingly used as research models, but questions remain about their generalizability and representational invariance. Biological neural networks under social constraints evolved to enable communicable representations, demonstrating generalization capabilities. This study proposes a communication protocol between cooperative agents to analyze the formation of individual and shared abstractions and their impact on task performance. This communication protocol aims to mimic language features by encoding high-dimensional information through low-dimensional representation. Using grid-world mazes and reinforcement learning, teacher ANNs pass a compressed message to a student ANN for better task completion. Through this, the student achieves a higher goal-finding rate and generalizes the goal location across task worlds. Further optimizing message content to maximize student reward improves information encoding, suggesting that an accurate representation in the space of messages requires bi-directional input. This highlights the role of language as a common representation between agents and its implications on generalization capabilities.